Web Page Genre Classification: Impact of n-Gram Lengths
Similar resources
Web Page Genre Classification: Impact of n-Gram Lengths
Web pages can be distinguished by their topic and by their genre. Knowing a page's genre can help modern search engines focus on the user's information need. In this paper, web pages are represented using character n-grams. Character n-gram representation is language independent and allows automatic extraction of features from a web page. Character n-gram representation of a web pa...
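To illustrate the character n-gram representation mentioned in this abstract, here is a minimal sketch in Python. It is not the paper's implementation: the function names `char_ngrams` and `ngram_profile`, the sample text, and the chosen lengths (2, 3, 4) are all assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the overlapping character n-grams of length n in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_profile(text: str, lengths=(2, 3, 4)) -> dict:
    """Build one n-gram count table per length (lengths are illustrative)."""
    return {n: char_ngrams(text, n) for n in lengths}


# Hypothetical page text; a real pipeline would first extract text from HTML.
page_text = "web page genre"
profile = ngram_profile(page_text)
```

Because the window operates on raw characters, no tokenizer or language-specific resource is needed, which is what makes the representation language independent. Varying the entries in `lengths` is one simple way to compare n-gram lengths.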
Performance Improvement of Web Page Genre Classification
Given the dynamic nature of the web and the growing number of web pages, it is very difficult to find the required pages easily and quickly among the thousands of web pages retrieved by a search engine. One solution to this problem is to classify web pages according to their genre. Automatic genre identification of web pages has become an important area in web page classification, because...
URL-Based Web Page Classification: With n-Gram Language Models
There are some situations these days in which it is important to have an efficient and reliable classification of a web-page from the information contained in the Uniform Resource Locator (URL) only, without the need to visit the page itself. For example, a social media website may need to quickly identify status updates linking to malicious websites to block them. The URL is very concise, and ...
An n-gram Based Approach to the Classification of Web Pages by Genre
The extraordinary growth in both the size and popularity of the World Wide Web has created a growing interest not only in identifying Web page genres, but also in using these genres to classify Web pages. The hypothesis of this research is that an n-gram representation of a Web page can be used effectively to automatically classify that Web page by genre. This research involves the development ...
Genre Classification of Web Pages
Genre classification means to discriminate between documents by means of their form, their style, or their targeted audience. Put another way, genre classification is orthogonal to a classification based on the documents’ contents. While most of the existing investigations of an automated genre classification are based on news articles corpora, the idea here is applied to arbitrary Web pages. W...
Journal
Journal title: International Journal of Computer Applications
Year: 2014
ISSN: 0975-8887
DOI: 10.5120/15412-3907